Abstract

In this paper, the problem of opportunistic channel sensing and access in cognitive radio networks 
when the sensing is imperfect and a secondary user has limited traffic to send at a time is investigated. 
Primary users' statistical information is assumed to be unknown, and therefore, a secondary user needs to 
learn the information online during channel sensing and access process, which means learning loss, also 
referred to as regret, is inevitable. In this research, the case when all potential channels can be sensed 
simultaneously is investigated first. The channel access process is modeled as a multi-armed bandit
problem with side observation, and channel access rules are derived and theoretically proved to have
asymptotically finite regret. Then the case when the secondary user can sense only a limited number of
channels at a time is investigated. The channel sensing and access process is modeled as a bi-level
multi-armed bandit problem. It is shown that any adaptive rule has at least logarithmic regret. We then
derive channel sensing and access rules and theoretically prove that they have logarithmic regret both
asymptotically and in finite time. The effectiveness of the derived rules is validated by computer simulation.

Keywords - Cognitive radio; opportunistic channel access; bandit problem; channel exploration; channel 
exploitation. 

I. Introduction 

Cognitive radio has emerged as an effective solution to alleviate the spectrum shortage problem and 
improve spectrum efficiency. It has received tremendous research attention recently [1]-[6]. In a cognitive
radio network, opportunistic spectrum access (OSA) is used, in which the unlicensed users, referred to as 
secondary users, search for spectrum holes through sensing, and utilize the observed spectrum opportu- 
nities for their data transmission. Optimal OSA when the secondary users have statistical information of 
licensed users (referred to as primary users), such as information of free probabilities of primary channels, 
has been addressed in [7]-[11], to maximize transmission capacity, optimize transmission power efficiency,
etc. However, research on the optimal OSA without a priori statistical knowledge of primary channels 
is still in its infancy. The research challenge is how to achieve the optimal tradeoff between channel 
exploration (the process to sense the channels so as to learn the statistical information) and channel 
exploitation (the process to utilize observed channel opportunities). If statistical information of primary 
channels is known in advance, a secondary user can select the optimal channels to sense and subsequently 
access sensed-free channels. However, without such information, a learning process is needed, and the 
secondary user should also explore suboptimal channels through sensing to learn statistical information 
of those channels. Therefore, learning loss is expected, compared to the case that the secondary user 
always selects the optimal channels. In the literature, the channel sensing and access process has been 
modeled as a multi-armed bandit problem (MABP) [12]. For an MABP, the loss due to learning until time
instant t is represented by the regret R(t), the difference between the actual reward of an arm-selection
rule and the reward of a genie-aided rule that knows the statistical information of the arms [13]. It is
proved in [14] that for any adaptive allocation rule,^1 the regret is at least $\mu \ln t$ when $t \to \infty$, where the
factor $\mu$ is determined by the statistical information of arms. A rule that achieves the lower bound $\mu \ln t$
is called efficiently optimal, and a rule with regret $O(\ln t)$ is called order optimal. For OSA in cognitive
radio networks, reference [12] derives order optimal rules that balance channel
exploration and exploitation, under the assumption of perfect channel sensing. Although not efficiently
optimal, the rules are sample mean based index rules [15], and their implementation is much simpler
than the efficiently optimal rules given in [14]. Moreover, a regret bound is also obtained for finite t^2
for the rules in [12], while no such bound is obtained for finite t for the efficiently optimal rules in [14]. A
distributed cognitive sensing problem is investigated and formulated as an adversarial bandit problem in
[16], where no statistical assumption is made on channel states. Multi-user OSA in a distributed manner is
investigated in [17], modeled as an MABP with multiple players. In the above existing research efforts
for OSA in cognitive radio, perfect channel sensing is assumed, and each secondary user can utilize all
observed spectrum opportunities (i.e., infinitely backlogged traffic is assumed at the secondary user).

Unlike existing research efforts, this work explores OSA when i) imperfect channel sensing is assumed 
and ii) a secondary user has only limited "access demand" (i.e., it may not use all observed spectrum 
opportunities at a time period). Our motivation for i) is that channel sensing is always imperfect in a 
real network. And our motivation for ii) is that a user may have only limited traffic to send at a time 
period (for example, for a voice conversation).^3 A similar setup with limited access demand is adopted
in [18]-[20]. Therefore, unlike existing OSA research where there is only one decision (i.e., to decide 
which channels to sense, and subsequently access all sensed-free channels), we have two decisions in 
the OSA in our work: to decide which channels to sense; and if a number of channels are sensed free, 
to decide which channels to access. Two cases are considered in our work: 

• Case I: when a secondary user can sense all potential channels simultaneously, referred to as full 
channel sensing; 

• Case II: when a secondary user can sense a subset of the potential channels simultaneously, referred 
to as partial channel sensing. 

Case I is investigated in Section II, in which we derive OSA rules and theoretically prove that they have
asymptotically finite regret. Case II is investigated in Section III, in which we derive OSA rules and
theoretically prove that they have regret $O(\ln t)$ as $t \to \infty$ and for finite t. Performance evaluation
of the derived OSA rules is given in Section IV, followed by concluding remarks in Section V.

1 This means the decisions of the rule are based only on observations in the history [14].
2 In this paper, "finite t" means sufficiently large and finite t.
3 Actually, the case when a secondary user has unlimited access demand can be viewed as a special case of our work.

II. Case I: with Full Channel Sensing 

Consider a slotted system, where time is partitioned into slots, and the duration of each slot is T. For 
a secondary user, there are N potential primary channels, denoted as Channels 1,2, ...,N, respectively. 
In each slot, Channel i ($i \in \{1, 2, \ldots, N\}$) is free (i.e., without primary activities) with probability $\theta_i$, and
$\theta_i$ is unknown to the secondary user. Let $S_i(j) = 1$ and $S_i(j) = 0$ denote that Channel i is free and busy,
respectively, at Slot j. For each channel, the channel states (busy or free) vary independently from one slot
to another, and the N channels have independent channel states.

Each slot consists of a sensing period with duration $\tau$ and a data transmission period with duration $T - \tau$.
For each slot, during the sensing period the secondary user senses all the N channels. Among all the 
sensed-free channels, the secondary user can access (i.e., transmit its data over) up to K channels in the 
data transmission period. For each accessed channel, the transmission rate is denoted B. 

During the sensing in Slot j, denote $X(j) = (X_1(j), X_2(j), \ldots, X_N(j))$ as the sensing result of the N
channels, where $X_i(j) = 1$ and $X_i(j) = 0$ mean Channel i is sensed to be free and busy, respectively.
Since sensing errors are inevitable, we let $P_d^i$ denote the detection probability of Channel i (i.e., the
probability of detecting the primary user activity if there is primary user activity), and $P_f^i$ denote the
false-alarm probability of Channel i (i.e., the probability of mistakenly estimating that the primary user
is active when there is actually no primary user activity).
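
The slot model above can be illustrated with a minimal Python sketch. The function names `draw_states` and `sense` are hypothetical and not part of the paper, and the sketch assumes that sensing errors are independent across channels.

```python
import random

def draw_states(thetas, rng):
    """Return S_i(j) for one slot: 1 if Channel i is free, else 0."""
    return [1 if rng.random() < th else 0 for th in thetas]

def sense(states, p_detect, p_false_alarm, rng):
    """Return X_i(j) for one slot: 1 if Channel i is sensed free, else 0.

    A busy channel is reported busy with probability p_detect[i];
    a free channel is reported busy with probability p_false_alarm[i].
    """
    result = []
    for s, pd, pf in zip(states, p_detect, p_false_alarm):
        if s == 1:
            # Free channel: a false alarm reports it as busy.
            result.append(0 if rng.random() < pf else 1)
        else:
            # Busy channel: a missed detection reports it as free.
            result.append(0 if rng.random() < pd else 1)
    return result

rng = random.Random(0)
states = draw_states([0.9, 0.5, 0.1], rng)
observed = sense(states, [0.8] * 3, [0.3] * 3, rng)
```

With perfect sensing (detection probability 1 and false-alarm probability 0) the sensing result coincides with the true channel states.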

Since the secondary user senses all the N channels, the only decision the secondary user has to make is
which channel(s) to access based on its sensing result. To protect primary users, only channels sensed
free can be accessed. Since primary users' statistical information $\Theta = (\theta_1, \theta_2, \ldots, \theta_N)$ is unknown, online
learning is needed for the secondary user to estimate $\Theta$. In the following, we first investigate the situation
of single channel access (i.e., K = 1, the secondary user can or needs to access only one channel in a
slot), and subsequently extend the result to the situation of multiple channel access (i.e., $K \ge 2$,
the secondary user can or needs to access more than one channel simultaneously in a slot).

A. Single Channel Access at a Slot (K = 1)

To evaluate the performance of a channel access rule, we use the performance of a genie-aided rule (in
which the channel statistical information is known) as a benchmark for comparison. Until Slot t, the
expected reward, defined as the total number of bits transmitted by the secondary user, of the genie-aided
rule is given as

$$\sum_{j=1}^{t} B(T-\tau)\, E\Big[\max_{i \in \mathcal{I}(j)} E[S_i(j) \mid X_i(j) = 1]\Big]$$

where $\mathcal{I}(j)$ denotes the set of channels sensed free at Slot j, and $E[\cdot]$ denotes expectation. In the
reward expression, the outer expectation is for $\mathcal{I}(j)$, and the inner expectation is for $S_i(j)$.
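
The inner expectation $E[S_i(j) \mid X_i(j) = 1]$ follows from Bayes' rule: it equals $(1 - P_f^i)\theta_i / f(\theta_i)$, where $f(\theta_i) = (1 - P_f^i)\theta_i + (1 - P_d^i)(1 - \theta_i)$ is the probability that Channel i is sensed free (the same expressions appear in Appendix II). A minimal Python sketch of the genie-aided choice under this formula is given below; the function names are hypothetical.

```python
def prob_sensed_free(theta, p_detect, p_false_alarm):
    """f(theta): probability that a channel is sensed free."""
    return (1.0 - p_false_alarm) * theta + (1.0 - p_detect) * (1.0 - theta)

def posterior_free(theta, p_detect, p_false_alarm):
    """E[S_i | X_i = 1]: probability the channel is free given it is sensed free."""
    f = prob_sensed_free(theta, p_detect, p_false_alarm)
    if f == 0.0:
        # The channel can never be sensed free, so the value is never used.
        return 0.0
    return (1.0 - p_false_alarm) * theta / f

def genie_choice(sensed_free, thetas, p_detect, p_false_alarm):
    """Genie-aided single-channel choice among the sensed-free channels."""
    if not sensed_free:
        return None  # nothing can be accessed in this slot
    return max(
        sensed_free,
        key=lambda i: posterior_free(thetas[i], p_detect[i], p_false_alarm[i]),
    )
```

For example, with $\theta_i = 0.9$, $P_d^i = 0.8$ and $P_f^i = 0.3$, the posterior is $0.63/0.65 \approx 0.969$.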
For any adaptive allocation rule denoted $\psi$, where $\psi(j) = i$ means Channel i is decided to be accessed
at Slot j, the expected reward until Slot t is $\sum_{j=1}^{t} B(T-\tau) \sum_{i=1}^{N} (1 - P_f^i)\,\theta_i\, \mathrm{Prob}(\psi(j) = i)$, where $\mathrm{Prob}(\cdot)$
means probability of an event.

The regret (also the learning loss) of rule $\psi$ until Slot t, defined as the difference between the expected
rewards of $\psi$ and the genie-aided rule, is given as

$$R(t,\psi) = \sum_{j=1}^{t} B(T-\tau)\, E\Big[\max_{i \in \mathcal{I}(j)} E[S_i(j) \mid X_i(j) = 1]\Big] - \sum_{j=1}^{t} B(T-\tau) \sum_{i=1}^{N} (1 - P_f^i)\,\theta_i\, \mathrm{Prob}(\psi(j) = i). \quad (1)$$



Since the secondary user can sense all the channels before selecting a channel to access, the channel 
access process can be modeled as an MABP with side observation [21]. For an MABP, it is extremely 
hard to derive an optimal channel access strategy such that the regret is minimized. Therefore, researchers 
instead focus on regret bound in asymptotic sense. For example, in [12], asymptotically order optimal 
rules are derived such that the regret is $O(\ln t)$ when $t \to \infty$. In our research, we also focus on channel
access rule with good asymptotic performance such as asymptotically finite regret. Note that for two- 
armed bandit problem with side observation, reference [21] gives a rule with asymptotically finite regret 
under direct information setting. In our work, we derive a rule with asymptotically finite regret for our 
multi-armed bandit problem with side observation, as follows. 

For sensing of the N channels, we have $2^N$ possible combinations of the sensing result. Denote $\mathcal{U}$ as
the set of the $2^N$ possible combinations. For each $u \in \mathcal{U}$, at each slot (say Slot t) we keep a record of
$L_u$, which denotes the rate of u as the sensing result, given as the ratio of the number of slots in which
u is the sensing result to t. Also define $P_u^{\Theta'}$ as the probability that u is the sensing result at a slot,
which is numerically calculated assuming that $\Theta'$ is the vector of free probabilities of the N channels.
Our proposed channel access rule is shown in Algorithm 1.
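
As an illustration of these two quantities, the following Python sketch computes the model probability of every sensing outcome for a given vector of free probabilities, and the empirical rate $L_u$ from a history of sensing results. It assumes that sensing errors are independent across channels, so the probability of an outcome is a product over channels; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def outcome_probabilities(thetas, p_detect, p_false_alarm):
    """Probability of each of the 2^N sensing outcomes u for a given theta vector."""
    # f[i] is the probability that Channel i is sensed free.
    f = [
        (1.0 - pf) * th + (1.0 - pd) * (1.0 - th)
        for th, pd, pf in zip(thetas, p_detect, p_false_alarm)
    ]
    probs = {}
    for u in product((0, 1), repeat=len(thetas)):
        p = 1.0
        for x, fi in zip(u, f):
            p *= fi if x == 1 else 1.0 - fi
        probs[u] = p
    return probs

def empirical_rates(history):
    """L_u: fraction of past slots in which u was the sensing result."""
    t = len(history)
    counts = Counter(tuple(x) for x in history)
    return {u: c / t for u, c in counts.items()}
```

The table has $2^N$ entries, which is one reason a rule built on it becomes costly as N grows.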

Algorithm 1 Single Channel Access with Full Channel Sensing at Slot t 
1: Sense N channels, obtain sensing result X(t), and update $L_u$, $u \in \mathcal{U}$.
2: Construct candidate set C(t) of the form 



C(t) 



3: Arbitrarily pick $\hat{\Theta}(t) \in C(t)$, and calculate the conditionally expected reward
   $B(T-\tau) E[S_i(t) \mid X_i(t) = 1]$ ($i \in \mathcal{I}(t)$) by using $\hat{\Theta}(t)$ as the vector of channel free
   probabilities. Here $\mathcal{I}(t)$ denotes the set of channels sensed free at Slot t.
4: if $\mathcal{I}(t)$ is empty then
5:   Do not access any channel at Slot t.
6: else
7:   Access Channel $i^* = \arg\max_{i \in \mathcal{I}(t)} E[S_i(t) \mid X_i(t) = 1]$.
Theorem 1: Algorithm 1 achieves asymptotically finite regret; that is, $\limsup_{t \to \infty} R(t) < \infty$.

Proof: See Appendix I. ■

Theorem 1 indicates that the performance of Algorithm 1 is surprisingly good, owing to full channel
sensing prior to channel access. As a comparison, in the rules derived in [12] where the secondary user
senses one channel with perfect sensing, performance of $R(t) \sim O(\ln t)$ is achieved, which means the
regret goes to infinity when $t \to \infty$.

Algorithm 1 suffers from high complexity in the construction of candidate set C(t) in each slot. To
reduce complexity, an alternative channel access rule with linear complexity is introduced, as given in
Algorithm 2.

Algorithm 2 Single Channel Access with Full Channel Sensing at Slot t
1: Sense N channels, and obtain sensing result X(t).
2: Estimate the free probability of Channel i ($i \in \{1, 2, \ldots, N\}$) to be
   $\hat{\theta}_i(t) = \dfrac{\frac{1}{t}\sum_{j=1}^{t} X_i(j) + P_d^i - 1}{P_d^i - P_f^i}$.
3: Calculate conditionally expected rewards $B(T-\tau) E[S_i(t) \mid X_i(t) = 1]$, $i \in \mathcal{I}(t)$, by using
   $\hat{\Theta}(t) = (\hat{\theta}_1(t), \hat{\theta}_2(t), \ldots, \hat{\theta}_N(t))$ as the vector of channel free probabilities. Here $\mathcal{I}(t)$
   denotes the set of channels sensed free at Slot t.
4: if $\mathcal{I}(t)$ is empty then
5:   Do not access any channel at Slot t.
6: else
7:   Access Channel $i^* = \arg\max_{i \in \mathcal{I}(t)} E[S_i(t) \mid X_i(t) = 1]$.

Theorem 2: Algorithm 2 achieves asymptotically finite regret.

Proof: See Appendix II. ■
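
One slot of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the estimator in Step 2 is taken to be the sample mean of the sensing results, inverted through $f(\theta) = (1 - P_f^i)\theta + (1 - P_d^i)(1 - \theta)$.

```python
def estimate_theta(sensed_free_count, t, p_detect, p_false_alarm):
    """Step 2: invert f(theta) at the empirical sensed-free rate."""
    rate = sensed_free_count / t
    return (rate + p_detect - 1.0) / (p_detect - p_false_alarm)

def algorithm2_step(counts, t, sensing_result, p_detect, p_false_alarm):
    """Run one slot; counts[i] is updated in place with the new sensing result.

    Returns the index of the channel to access, or None if none is sensed free.
    """
    for i, x in enumerate(sensing_result):
        counts[i] += x
    best, best_value = None, float("-inf")
    for i, x in enumerate(sensing_result):
        if x != 1:
            continue  # only sensed-free channels may be accessed
        theta_hat = estimate_theta(counts[i], t, p_detect[i], p_false_alarm[i])
        # f(theta_hat) equals counts[i] / t, which is positive for a channel
        # that is sensed free in the current slot.
        f_hat = (1.0 - p_false_alarm[i]) * theta_hat \
            + (1.0 - p_detect[i]) * (1.0 - theta_hat)
        value = (1.0 - p_false_alarm[i]) * theta_hat / f_hat
        if value > best_value:
            best, best_value = i, value
    return best
```

The work per slot is linear in N, in contrast to the $2^N$ outcome table used by Algorithm 1.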

B. Multiple Channel Access at a Slot (K > 1)

Assume the secondary user can simultaneously access up to K (> 1) channels at a slot. Therefore,
if the number of channels sensed free at a slot is less than or equal to K, then all those sensed-free 
channels are accessed by the secondary user; otherwise, K channels are selected among the sensed-free 
channels to be accessed by the secondary user. 

We still use the performance of a genie-aided rule with known $\Theta$ as a benchmark for comparison.
Until Slot t, the expected reward of the genie-aided rule is given as

$$\sum_{j=1}^{t} B(T-\tau)\, E\Big[\max_{\mathcal{K}(j) \subseteq \mathcal{I}(j),\ |\mathcal{K}(j)| \le K}\ \sum_{i \in \mathcal{K}(j)} E[S_i(j) \mid X_i(j) = 1]\Big]$$

where $\mathcal{I}(j)$ denotes the set of channels sensed free at Slot j and $\mathcal{K}(j)$ denotes the set of channels to be
accessed at Slot j.

For any adaptive allocation rule $\Psi$ for multiple channel access, where $\Psi(j)$ denotes the set of channels
to be accessed at Slot j, the expected reward until Slot t is $\sum_{j=1}^{t} B(T-\tau) \sum_{i=1}^{N} (1 - P_f^i)\,\theta_i\, \mathrm{Prob}(i \in \Psi(j))$.

The regret of rule $\Psi$ is given as

$$R(t,\Psi) = \sum_{j=1}^{t} B(T-\tau)\, E\Big[\max_{\mathcal{K}(j) \subseteq \mathcal{I}(j),\ |\mathcal{K}(j)| \le K}\ \sum_{i \in \mathcal{K}(j)} E[S_i(j) \mid X_i(j) = 1]\Big] - \sum_{j=1}^{t} B(T-\tau) \sum_{i=1}^{N} (1 - P_f^i)\,\theta_i\, \mathrm{Prob}(i \in \Psi(j)).$$

For multiple channel access, we modify Step 7 in Algorithm 1 and Algorithm 2 as follows: if $|\mathcal{I}(t)| \le K$,
then access all channels in $\mathcal{I}(t)$; otherwise, among all the channels in $\mathcal{I}(t)$, access the K channels
with the largest K values of $E[S_i(t) \mid X_i(t) = 1]$. It can be proved that the resulting algorithms have
asymptotically finite regret. The proofs are similar to those of Theorems 1 and 2 and are omitted here.
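
The modified Step 7 amounts to a top-K selection over the sensed-free channels. A minimal Python sketch follows; `select_channels` and the `scores` container are hypothetical names, where `scores[i]` stands for the estimated value of $E[S_i(t) \mid X_i(t) = 1]$.

```python
def select_channels(sensed_free, scores, k):
    """Return the channels to access: all sensed-free channels if there are
    at most k of them, otherwise the k with the largest scores."""
    if len(sensed_free) <= k:
        return sorted(sensed_free)
    ranked = sorted(sensed_free, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

For instance, with sensed-free channels 0, 2 and 3, scores 0.9, 0.5 and 0.7, and K = 2, channels 0 and 3 are accessed.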

III. Case II: with Partial Channel Sensing 

Still consider N channels. At a slot, the secondary user can sense M(< N) of them and can access 
up to K (< M) channels among the sensed-free channels. Therefore, we have a bi-level MABP: the first 
level is to decide which M channels to sense; and the second level is to decide, among the sensed-free 
channels, which up to K channels to access. The arms played in the two levels are different, which makes 
the problem much more challenging than classical MABP. To the best of our knowledge, a general bi- 
level MABP is still an open problem. In the following, we provide solutions to our particular bi-level 
MABP. Possible extension of our solutions to a more general bi-level MABP is to be investigated in our 
future work. 

Unlike Case I where we have common channel access rules for homogeneous sensing (i.e., $P_d^i = P_d$,
$P_f^i = P_f$, $\forall i \in \{1, 2, \ldots, N\}$) and heterogeneous sensing (i.e., for each channel, say Channel i, we have
distinct setting $\{P_d^i, P_f^i\}$), the homogeneous sensing and heterogeneous sensing need to be treated in
different ways in Case II, as discussed in Sections III-A and III-B, respectively.

A. Homogeneous Sensing 

Consider $P_d^i = P_d$, $P_f^i = P_f$, $\forall i \in \{1, 2, \ldots, N\}$. Without loss of generality, we assume
$\theta_1 > \theta_2 > \ldots > \theta_N$.

We still use the performance of a genie-aided rule as a benchmark for comparison. It can be proved that
the genie-aided rule should always sense $\mathcal{M}^* = \{1, 2, \ldots, M\}$. So until Slot t, the expected reward of the
genie-aided rule is given as

$$U^*(t) = \sum_{j=1}^{t} E\Big[B(T-\tau) \max_{\mathcal{K}(j) \subseteq \mathcal{I}_{\mathcal{M}^*}(j),\ |\mathcal{K}(j)| \le K}\ \sum_{i \in \mathcal{K}(j)} E[S_i(j) \mid X_i(j) = 1]\Big]$$

where $\mathcal{I}_{\mathcal{M}^*}(j)$ denotes the set of sensed-free channels at Slot j if the channels in $\mathcal{M}^*$ are sensed, and
$\mathcal{K}(j)$ denotes the set of channels to access at Slot j.

In the following, we investigate single channel access (K = 1) and multiple channel access (K > 1), 
respectively. 

1) Single Channel Access at a Slot (K = 1): The expected reward of the genie-aided rule until Slot
t is:

$$U^*(t) = \sum_{j=1}^{t} E\Big[B(T-\tau) \max_{i \in \mathcal{I}_{\mathcal{M}^*}(j)} E[S_i(j) \mid X_i(j) = 1]\Big]. \quad (2)$$

Compared with the genie-aided rule, the regret of a single channel access rule $\phi$, in which $\phi(j)$ denotes the
channel to be accessed at Slot j, is given as

$$R(t,\phi) = U^*(t) - \sum_{j=1}^{t} B(T-\tau) \sum_{i=1}^{N} (1 - P_f)\,\theta_i\, \mathrm{Prob}(\phi(j) = i). \quad (3)$$

Unlike Case I in Section II, we cannot expect asymptotically finite regret R(t). The reason is as
follows. For partial channel sensing, consider a perfect scenario in which all sensed-free channels are
to be accessed and all sensings are perfect. It is shown in Theorem 3.1 in [14] and Lemma 2 in [12]
that the perfect scenario has a lower bound of $O(\ln t)$ on R(t) as $t \to \infty$. It can be proved (the proof is
omitted due to space limit) that, if the perfect scenario has regret $C \ln t$ where C is a constant, then our
research problem has regret at least $D \ln t$ where D is a constant.

Note that references [14] and [15] give rules with regret $O(\ln t)$ when $t \to \infty$. However, the performance
of the rules with finite t is still unclear. In the following, using UCB1 (here UCB stands for Upper
Confidence Bound) in [22], we derive a channel sensing and access rule that has regret $R(t) \sim O(\ln t)$
as $t \to \infty$ and for finite t. Note that the original UCB1 cannot be directly applied to our research
problem because, if it is directly applied, there is only one decision, i.e., which channels to sense at a
slot. Since in our research problem there are two decisions (which channels to sense, and which channel
to access among the sensed-free channels), we make the necessary extensions to the original UCB1.

At each slot (say Slot t), the secondary user keeps records $T(t) = (T_1(t), T_2(t), \ldots, T_N(t))$ and $Y(t) =
(Y_1(t), Y_2(t), \ldots, Y_N(t))$, where $T_i(t)$ is the number of slots in which Channel i has been sensed until
Slot t, and $Y_i(t)$ is the number of slots in which Channel i has been sensed free until Slot t. The proposed
channel sensing and access rule is given in Algorithm 3.

Algorithm 3 Single Channel Access with Homogeneous Sensing in Case II (Partial Channel Sensing)
1: Sense all N channels by using $\lceil N/M \rceil$ slots (where $\lceil \cdot \rceil$ is a ceiling function). At each slot, randomly
   select one sensed-free channel to access. Update T and Y at each slot.
2: for each subsequent Slot t do
3:   Estimate $\theta_i$ ($i = 1, 2, \ldots, N$) by $\hat{\theta}_i(t) = \dfrac{Y_i(t-1)/T_i(t-1) + P_d - 1}{P_d - P_f}$, and determine channel set $\mathcal{M}(t)$ to sense,
     which includes channels with the M largest indexes $\hat{\theta}_i(t) + \dfrac{1}{P_d - P_f}\sqrt{\dfrac{2\ln(t-1)}{T_i(t-1)}}$.
4:   Sense channels in $\mathcal{M}(t)$. Let $\mathcal{I}(t)$ denote the set of sensed-free channels. Update T(t) and Y(t).
5:   if $\mathcal{I}(t)$ is nonempty then
6:     Access Channel $i^* = \arg\max_{i \in \mathcal{I}(t)} \Big[\hat{\theta}_i(t) + \dfrac{1}{P_d - P_f}\sqrt{\dfrac{2\ln(t-1)}{T_i(t-1)}}\Big]$.
7:   else
8:     Do not access any channel at Slot t.

Theorem 3: The regret R(t) of Algorithm 3 is $O(\ln t)$ as $t \to \infty$ and for finite t.

Proof: See Appendix III. ■
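
The sensing decision of Algorithm 3 can be sketched in Python as below. The sketch rests on an assumption about the exact index: a UCB1-style form, the estimate of $\theta_i$ plus a confidence term $\sqrt{2\ln(t-1)/T_i}$ scaled by $1/(P_d - P_f)$. The function names are hypothetical, and the code requires $t \ge 2$ and every channel to have been sensed at least once (which the initialization step of the algorithm provides).

```python
import math

def ucb_index(y, n, t, p_detect, p_false_alarm):
    """Assumed index: estimate of theta plus a scaled UCB1 confidence term.

    y is the number of slots the channel was sensed free, n the number of
    slots it was sensed (n >= 1), and t the current slot (t >= 2).
    """
    theta_hat = (y / n + p_detect - 1.0) / (p_detect - p_false_alarm)
    bonus = math.sqrt(2.0 * math.log(t - 1) / n) / (p_detect - p_false_alarm)
    return theta_hat + bonus

def choose_sensing_set(Y, T, t, m, p_detect, p_false_alarm):
    """Pick the m channels with the largest indexes to sense in Slot t."""
    indexes = [
        ucb_index(Y[i], T[i], t, p_detect, p_false_alarm) for i in range(len(T))
    ]
    order = sorted(range(len(T)), key=lambda i: indexes[i], reverse=True)
    return sorted(order[:m])
```

A channel that has rarely been sensed has a small $T_i$ and therefore a large confidence term, which is what drives exploration.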






2) Multiple Channel Access at a Slot (K > 1): When the secondary user can simultaneously access
K channels at a slot, we modify Algorithm 3 as follows: in Step 6, instead of accessing a single channel,
the secondary user selects up to K channels in $\mathcal{I}(t)$ with the largest values of $\hat{\theta}_i(t) + \dfrac{1}{P_d - P_f}\sqrt{\dfrac{2\ln(t-1)}{T_i(t-1)}}$.
Similar to the proof of Theorem 3, it can be proved that the regret of the resulting rule is $O(\ln t)$ for finite t
and for $t \to \infty$.

B. Heterogeneous Sensing

Consider that Channel i ($i = 1, \ldots, N$) has distinct setting $\{P_d^i, P_f^i\}$. The genie-aided rule with known
channel statistics is still used as a benchmark of performance.

When channel statistics are unknown, it is desired to find a rule with good regret performance R(t)
under heterogeneous sensing. Then a question is raised: can we find a rule similar to those in Section
III-A with $R(t) \sim O(\ln t)$ for finite t and for $t \to \infty$? To answer this question, we first look into the
insights behind the rules in Section III-A.

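As aforementioned, in Case II (partial channel sensing), there are two levels of MABP: the first level
is to select which channels to sense, i.e., to select channel set $\mathcal{M}$ to maximize

$$E\Big[B(T-\tau) \max_{\mathcal{K}(j) \subseteq \mathcal{I}_{\mathcal{M}}(j),\ |\mathcal{K}(j)| \le K}\ \sum_{i \in \mathcal{K}(j)} E[S_i(j) \mid X_i(j) = 1]\Big]$$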

while the second level is to select which channels to access, i.e., to select sensed-free channels with
the largest $E[S_i(j) \mid X_i(j) = 1]$. With homogeneous sensing, the criterion in the first level is simplified
to finding the M channels with the M largest $\theta_i$'s, while the criterion in the second level is simplified to,
among sensed-free channels, finding up to K channels with the largest $\theta_i$'s. Therefore, in Algorithm 3,
in both levels we use the sample mean of sensing results of each channel, which can be used to estimate
$\theta_i$. On the other hand, with heterogeneous sensing, the criteria in the two levels cannot be simplified to
finding channels with the largest $\theta_i$'s. Therefore, it is not feasible to use the sample mean of sensing results
as Algorithm 3 does. Rather, we need samples to reflect the reward of each arm in each level, as shown in
the following.

1) Single Channel Access at a Slot (K = 1): Since the secondary user can sense M channels at a slot,
the secondary user can sense one from $\binom{N}{M}$ possible sets of M channels, denoted $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_{\binom{N}{M}}$. In
set $\mathcal{M}_i$ ($i = 1, 2, \ldots, \binom{N}{M}$), let $m_{ij}$ ($j = 1, 2, \ldots, M$) denote the jth channel in $\mathcal{M}_i$. If the secondary user
senses set $\mathcal{M}_i$ at Slot t, let $\mathcal{I}_{\mathcal{M}_i}(t)$ represent the sensing result, which is the set of sensed-free channels.
Until Slot t, let $T_i(t)$ denote the number of time slots in which $\mathcal{M}_i$ is sensed, and $Y_i(t)$ denote the
cumulative reward of the slots in which $\mathcal{M}_i$ is sensed. Until Slot t, let $T_{ij}(t)$ ($j = 1, 2, \ldots, M$) denote
the number of slots in which $\mathcal{M}_i$ is sensed and subsequently Channel $m_{ij}$ is accessed, and $Y_{ij}(t)$
denote the cumulative reward of Channel $m_{ij}$ in time slots in which $\mathcal{M}_i$ is sensed and subsequently
Channel $m_{ij}$ is accessed. Note that when we say "reward", it means the secondary user transmits over a
channel, and receives ACK for the transmission. If no ACK is received, the reward of the corresponding
transmission is 0. The proposed channel sensing and access rule is given in Algorithm 4. The secondary
user keeps records of $T_i(t)$, $Y_i(t)$, $T_{ij}(t)$, and $Y_{ij}(t)$. In the sequel, for simplicity of presentation, the
index (t) may be omitted for $T_i(t)$, $Y_i(t)$, $T_{ij}(t)$, and $Y_{ij}(t)$.

Algorithm 4 Single Channel Access with Heterogeneous Sensing in Case II (Partial Channel Sensing)
1: for $i = 1 : \binom{N}{M}$ do
2:   Keep sensing $\mathcal{M}_i$ in continuous slots, and at each slot access one free channel that was not accessed
     before when $\mathcal{M}_i$ is sensed. This procedure is repeated until each channel in $\mathcal{M}_i$ has been accessed
     at least once. For each slot, update $T_i$, $Y_i$, $T_{ij}$, and $Y_{ij}$, $j = 1, 2, \ldots, M$.
3: for each subsequent Slot t do
4:   Calculate indexes $\dfrac{Y_i}{T_i} + \sqrt{\dfrac{2\ln(t-1)}{T_i}}$ ($i \in \{1, 2, \ldots, \binom{N}{M}\}$), and choose
     $i_t = \arg\max_{i = 1, \ldots, \binom{N}{M}} \Big[\dfrac{Y_i}{T_i} + \sqrt{\dfrac{2\ln(t-1)}{T_i}}\Big]$.
5:   Sense channels in $\mathcal{M}_{i_t}$.
6:   if $\mathcal{I}_{\mathcal{M}_{i_t}}(t)$, the set of sensed-free channels at Slot t, is nonempty then
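7:     Calculate indexes $\dfrac{Y_{i_t j}}{T_{i_t j}} + \sqrt{\dfrac{2\ln(t-1)}{T_{i_t j}}}$, $m_{i_t j} \in \mathcal{I}_{\mathcal{M}_{i_t}}(t)$.
8:     Select $j_t = \arg\max_{m_{i_t j} \in \mathcal{I}_{\mathcal{M}_{i_t}}(t)} \Big[\dfrac{Y_{i_t j}}{T_{i_t j}} + \sqrt{\dfrac{2\ln(t-1)}{T_{i_t j}}}\Big]$, access Channel $m_{i_t j_t}$, and check whether the
       transmission is successful.
9:     Update $T_{i_t}$, $Y_{i_t}$, $T_{i_t j_t}$, $Y_{i_t j_t}$.
10:  else
11:    Update $T_{i_t}$.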



Theorem 4: The regret of Algorithm 4 is $O(\ln t)$ as $t \to \infty$ and for finite t.

Proof: See Appendix IV. ■
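
The first-level decision of Algorithm 4 treats each candidate sensing set as one arm. The Python sketch below enumerates the $\binom{N}{M}$ sets and applies the standard UCB1 index (sample mean plus $\sqrt{2\ln(t-1)/T_i}$); the function names are hypothetical, rewards are assumed to be normalized to [0, 1], and every set is assumed to have been sensed at least once.

```python
import math
from itertools import combinations

def sensing_sets(n, m):
    """All C(n, m) candidate sets of m channels, as tuples of channel indices."""
    return list(combinations(range(n), m))

def ucb1_index(total_reward, plays, t):
    """UCB1 index: sample-mean reward plus confidence term (plays >= 1, t >= 2)."""
    return total_reward / plays + math.sqrt(2.0 * math.log(t - 1) / plays)

def choose_set(Y, T, t):
    """Index i_t of the sensing set with the largest first-level index.

    Y[i] is the cumulative reward and T[i] the number of slots for set i.
    """
    return max(range(len(T)), key=lambda i: ucb1_index(Y[i], T[i], t))
```

The number of first-level arms grows combinatorially: with N = 8 and M = 3 there are already 56 candidate sets.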

2) Multiple Channel Access at a Slot (K > 1): When the secondary user can simultaneously access up
to K channels at a slot, we modify Algorithm 4 as follows: in Steps 8 and 9, the secondary user selects
to access up to K sensed-free channels with the largest values of the indexes calculated in Step 7, $m_{i_t j} \in \mathcal{I}_{\mathcal{M}_{i_t}}(t)$,
and updates $T_{i_t j}$ and $Y_{i_t j}$ accordingly if Channel $m_{i_t j}$ is accessed. Similarly, it can be proved that the
regret of the resulting rule is $O(\ln t)$ for finite t and as $t \to \infty$.

IV. Performance Evaluation 

We use Monte-Carlo simulation to validate our analysis. Consider a cognitive radio network with
N = 8 primary channels whose free probabilities are given as 0.9, 0.8, 0.657, 0.564, 0.5, 0.456, 0.404, 0.34
for the 8 channels in our simulation. For homogeneous sensing we have $P_d = 0.8$ and $P_f = 0.3$,
while in heterogeneous sensing we have $(P_d^1, P_d^2, \ldots, P_d^8) = (0.8, 0.8, 0.7, 0.75, 0.9, 0.67, 0.85, 0.8)$ and
$(P_f^1, P_f^2, \ldots, P_f^8) = (0.3, 0.3, 0.2, 0.25, 0.36, 0.15, 0.32, 0.3)$. We also normalize $B(T - \tau) = 1$.
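
A single Monte-Carlo run for Case I with K = 1 and homogeneous sensing can be sketched in Python as follows, using the parameter values listed above. This is an illustrative sketch, not the simulation code behind the reported figures: the learner follows the sample-mean estimator of Algorithm 2, regret is accumulated as the per-slot difference in conditionally expected reward between the genie-aided choice and the learner's choice, and all function names are hypothetical.

```python
import random

THETAS = [0.9, 0.8, 0.657, 0.564, 0.5, 0.456, 0.404, 0.34]
PD, PF = 0.8, 0.3  # homogeneous sensing, with B(T - tau) normalized to 1

def posterior(theta):
    """E[S_i | X_i = 1] for a channel with free probability theta."""
    f = (1.0 - PF) * theta + (1.0 - PD) * (1.0 - theta)
    return (1.0 - PF) * theta / f

def estimate(count, t):
    """Estimate of theta from the number of sensed-free slots out of t."""
    return (count / t + PD - 1.0) / (PD - PF)

def simulate(slots, seed):
    """Accumulated regret of the learner over the given number of slots."""
    rng = random.Random(seed)
    n = len(THETAS)
    counts = [0] * n
    regret = 0.0
    for t in range(1, slots + 1):
        states = [rng.random() < th for th in THETAS]
        # A free channel is sensed free unless a false alarm occurs;
        # a busy channel is sensed free only on a missed detection.
        sensed = [
            (rng.random() >= PF) if s else (rng.random() >= PD) for s in states
        ]
        for i in range(n):
            counts[i] += sensed[i]
        free = [i for i in range(n) if sensed[i]]
        if not free:
            continue  # neither rule can access a channel in this slot
        genie = max(free, key=lambda i: posterior(THETAS[i]))
        learner = max(free, key=lambda i: posterior(estimate(counts[i], t)))
        regret += posterior(THETAS[genie]) - posterior(THETAS[learner])
    return regret
```

Averaging `simulate` over many seeds and plotting the result against the number of slots gives a regret curve of the kind discussed in this section.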

Case I with full channel sensing is evaluated first. Figs. 1 and 2 show the average regret of Algorithm
1 with homogeneous sensing and heterogeneous sensing, respectively, while Figs. 3 and 4 show the
average regret of Algorithm 2 with homogeneous sensing and heterogeneous sensing, respectively. From
the figures it can be seen that when t is large, R(t) tends to be finitely bounded, which is consistent
with our analysis in Section II. Note that, due to the complexity of Algorithm 1, Figs. 1 and 2 are averaged
over only 100 simulation runs, and thus, the regret R(t) does not always increase in the two figures.
Interestingly, in Figs. 3 and 4, R(t) increases when K changes from 1 to 3, and R(t) decreases when
K further changes to 5 and 7. This can be explained as follows. When K = 1, the false access (i.e.,
the proposed rule does not access the same channel as the genie-aided rule does) is only on one single
channel. When K changes to 3, the false access is on up to 3 channels, and thus, the reward loss is
likely to be larger than that with K = 1. When K further increases, the up to K channels selected by
the proposed rule and the up to K channels selected by the genie-aided rule are likely to differ only
slightly, and thus, the reward loss is reduced. When K = 8 in our example, there is no difference
between the channels selected by our proposed rule and the channels selected by the genie-aided rule,
which means the reward loss is 0.

Case II with partial channel sensing is then evaluated. Figs. 5 and 6 show average $R(t)/\ln t$ in
homogeneous sensing with the proposed single channel access and multiple channel access rules, respectively,
while Figs. 7 and 8 show average $R(t)/\ln t$ in heterogeneous sensing with the proposed single channel
access and multiple channel access rules, respectively. It can be seen from the four figures that when t
is large, average $R(t)/\ln t$ tends to be finitely bounded, which is consistent with our claim in Section III
that $R(t) \sim O(\ln t)$.

V. Conclusion 

In this paper, the problem of dynamic channel sensing and access by a secondary user in a cognitive 
radio network is investigated. In the case with full channel sensing, with side information through 
sensing all the channels, the regret due to unknown primary users' statistical information is proved to be 
asymptotically finite. On the other hand, for the case with partial channel sensing, asymptotically finite 
regret cannot be achieved since it is proved that the regret is at least $O(\ln t)$. Therefore, in our research we
derive channel sensing and access rules with regret O(lnt), for homogeneous sensing and heterogeneous 
sensing, respectively. This research should provide insights to the design of OSA in cognitive radio 
networks with unknown statistical information of primary channels. Further research may include the 
case with competition among multiple secondary users and the generalization of our solutions in Case II 
to a more general bi-level MABP. 

Appendix I
Proof of Theorem 1

Recall that $\Theta$ is the vector of real channel free probabilities, and in Step 3 of Algorithm 1 $\hat{\Theta}$ is used
to estimate $\Theta$. With sensing result X(t) at Slot t, denote $k_{\hat{\Theta}}(X(t))$ and $k_{\Theta}(X(t))$ as the best channel,
i.e., the one with the largest reward, when $\hat{\Theta}$ and $\Theta$ are used as channel availability statistics, respectively.
By following Algorithm 1, the probability of false access (i.e., accessing a suboptimal channel) is

$$\mathrm{Prob}\big(k_{\hat{\Theta}}(X(t)) \ne k_{\Theta}(X(t))\big) \le \mathrm{Prob}\big(\exists u \in \mathcal{U},\ k_{\hat{\Theta}}(u) \ne k_{\Theta}(u)\big). \quad (4)$$

Define a set $C_{\Theta} = \{\Theta' : \exists u \in \mathcal{U},\ k_{\Theta'}(u) \ne k_{\Theta}(u)\}$. Then (4) is equivalent to

$$\mathrm{Prob}\big(k_{\hat{\Theta}}(X(t)) \ne k_{\Theta}(X(t))\big) \le \mathrm{Prob}\big(\hat{\Theta} \in C_{\Theta}\big). \quad (5)$$



Define $\epsilon = \inf_{\Theta' \in C_{\Theta}} \sum_{u \in \mathcal{U}} \big(P_u^{\Theta'} - P_u^{\Theta}\big)^2$. Then we have $\epsilon > 0$ (the proof for this is omitted due to space
limit).



We first consider an event { A j Y (^r? ~~ -^n) 2 < § ( happens. From Algorithm [T] we have 

new 



y new J y new y new 

When t is large enough such that ~ < ~, from ([6]) we have 



(6) 



y new y new y new 

which means ®(t) ^ C e from the definition of e. It also means that, if ®(t) G C e , then we should have 

£ (P® - L u ) 2 > |. Then we have 

new 

Prob (@ G C e J < Prob ( /^(P® - L u f > ^ I < a(i) = (t + 1) 2 e <^W«^» (8 ) 
where the second inequality comes from the Sanov Theorem (i.e., Theorem 2.1.10) in [23], and B denotes 



a vector space < {L' u } uf zu : / ^ (P® - L' u ) 2 > | >, which is closed. 
[ y new J 

For the exponent in the expression of a(t), we have 

E L 'u HL'jr?) = E ( p u^ HL'JP®)) > f E p u^) * ( E p « e 4) = (9) 

new new v n y Vnew u / Vnew u ) 

where the inequality comes from the Jensen's inequality and the fact that xlnx is a convex function. 

In addition, Yl L' u \n(L' u / P®) is continuous and strictly convex, which, together with e > and ©, 

new 

leads to inf Yl L' u \n.{L' u / P®) > 0- And thus, from the definition of a(t) given in dD, we have 

l im ^+11 < 1. 
t ^oo a(t) 
From © and ©, we have Prob (k^ (X(i)) / fc e (X(*))) < a(t) when ± < §. So for regret R(t) of
Algorithm 1, we have

LfJ t 

limsup i?(i) < c V Prob (X(j)) + k&(K(j))) + co Km V Prob (X(j)) / MX(j))) 
t— >oo — : t->oo «— -* 

J =1 i=L?J+i 



< c 



3 * 

+ c Urn V a(j) < oo (10) 



i— >oo 

j=LM+i 



where $\lfloor \cdot \rfloor$ is a floor function, $c_0$ denotes the largest possible reward loss due to false access in a slot,
which is finite, and the last inequality comes from $\lim_{t \to \infty} \frac{a(t+1)}{a(t)} < 1$.

Therefore, by following Algorithm 1, asymptotically finite regret is achieved.



Appendix II
Proof of Theorem 2

For Algorithm 2, the probability of false access is calculated as

Prob (k^ t) (X(t)) ^ k & (X(t))) = £ (k$ (t) (u) + M«)) Prob (X(t) = «) (11) 

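in which

$$\begin{aligned}
\mathrm{Prob}\big(k_{\hat\Theta(t)}(u)\ne k_{\Theta}(u)\big)
&= \mathrm{Prob}\left(\arg\max_{i\in\mathcal{X}_u}\frac{(1-P_f^i)\hat\theta_i(t)}{f(\hat\theta_i(t))} \ne \arg\max_{i\in\mathcal{X}_u}\frac{(1-P_f^i)\theta_i}{f(\theta_i)}\right)\\
&\le \sum_{i>k,\ \pi(i)\in\mathcal{X}_u,\ \pi(k)\in\mathcal{X}_u} \mathrm{Prob}\left(\frac{(1-P_f^{\pi(i)})\hat\theta_{\pi(i)}(t)}{f(\hat\theta_{\pi(i)}(t))} > \frac{(1-P_f^{\pi(k)})\hat\theta_{\pi(k)}(t)}{f(\hat\theta_{\pi(k)}(t))}\right)
\end{aligned}\tag{12}$$

where $\mathcal{X}_u$ is the set of sensed-free channels when the sensing result is $X(t)=u$, $f(\theta_i) = (1-P_f^i)\theta_i + (1-P_d^i)(1-\theta_i)$, and $(\pi(1),\pi(2),\ldots,\pi(N))$ is a permutation of $(1,2,\ldots,N)$ such that $\frac{(1-P_f^{\pi(1)})\theta_{\pi(1)}}{f(\theta_{\pi(1)})} > \frac{(1-P_f^{\pi(2)})\theta_{\pi(2)}}{f(\theta_{\pi(2)})} > \cdots > \frac{(1-P_f^{\pi(N)})\theta_{\pi(N)}}{f(\theta_{\pi(N)})}$.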

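First consider homogeneous sensing when $P_d^i = P_d$ and $P_f^i = P_f$, $i\in\{1,2,\ldots,N\}$. Without loss of generality, assume $\theta_1>\theta_2>\cdots>\theta_N$. Then (12) is simplified as

$$\mathrm{Prob}\big(k_{\hat\Theta(t)}(u)\ne k_{\Theta}(u)\big) \le \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \mathrm{Prob}\big(\hat\theta_i(t) > \hat\theta_k(t)\big). \tag{13}$$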

According to Algorithm 2, we have $\hat\theta_i(t)=\frac{X_i^t/t-(1-P_d)}{P_d-P_f}$ to estimate $\theta_i$. We denote the sum of sensing samples until Slot $t$ for Channel $i$ ($i=1,2,\ldots,N$) as $X_i^t=\sum_{j=1}^{t} X_i(j)$. So $X_1^t, X_2^t,\ldots,X_N^t$ are independent binomial random variables with parameters $f(\theta_1), f(\theta_2),\ldots,f(\theta_N)$, respectively. When $t$ is large enough (say $t>t_0$), the binomial distribution of $X_i^t$ can be approximated as a normal distribution with mean $tf(\theta_i)$ and variance $tf(\theta_i)(1-f(\theta_i))$. We use $g_{X_i^t}$ to denote the probability density function of $X_i^t$, which follows a normal distribution. Then for the term in the summation in (13), we have

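$$\begin{aligned}
\mathrm{Prob}\big(\hat\theta_i(t)>\hat\theta_k(t)\big) &= \int_{-\infty}^{+\infty} g_{X_k^t}(y)\int_{y}^{+\infty} g_{X_i^t}(x)\,dx\,dy\\
&= \int_{-\infty}^{tf(\theta_i)} g_{X_k^t}(y)\int_{y}^{+\infty} g_{X_i^t}(x)\,dx\,dy + \int_{tf(\theta_i)}^{+\infty} g_{X_k^t}(y)\int_{y}^{+\infty} g_{X_i^t}(x)\,dx\,dy.
\end{aligned}\tag{14}$$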

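The two terms on the right hand side of (14) have the following upper bounds.

$$\int_{-\infty}^{tf(\theta_i)} g_{X_k^t}(y)\int_{y}^{+\infty} g_{X_i^t}(x)\,dx\,dy \le \int_{-\infty}^{tf(\theta_i)} g_{X_k^t}(y)\,dy = Q\left(\frac{(f(\theta_k)-f(\theta_i))\,t}{\sqrt{f(\theta_k)(1-f(\theta_k))\,t}}\right) \le \frac{1}{2}\, e^{-\frac{(f(\theta_k)-f(\theta_i))^2 t}{2f(\theta_k)(1-f(\theta_k))}} \tag{15}$$

where the second inequality comes from the Chernoff bound. Here $Q(\cdot)$ is the Q-function given as $Q(x)=\frac{1}{\sqrt{2\pi}}\int_{x}^{+\infty} e^{-v^2/2}\,dv$.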

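$$\begin{aligned}
&\int_{tf(\theta_i)}^{+\infty} g_{X_k^t}(y)\int_{y}^{+\infty} g_{X_i^t}(x)\,dx\,dy\\
&\quad\le \int_{tf(\theta_i)}^{+\infty} \frac{1}{\sqrt{2\pi f(\theta_k)(1-f(\theta_k))t}}\, e^{-\frac{(y-f(\theta_k)t)^2}{2f(\theta_k)(1-f(\theta_k))t}}\cdot\frac{1}{2}\, e^{-\frac{(y-f(\theta_i)t)^2}{2f(\theta_i)(1-f(\theta_i))t}}\,dy\\
&\quad= \frac{1}{2}\int_{tf(\theta_i)}^{+\infty} \frac{1}{\sqrt{2\pi R_k}}\, e^{-\frac{R_i(y-f(\theta_k)t)^2+R_k(y-f(\theta_i)t)^2}{2R_iR_k}}\,dy\\
&\quad\le \frac{1}{4}\sqrt{\frac{R_i}{R_i+R_k}}\; e^{-\frac{(f(\theta_i)t-f(\theta_k)t)^2}{2(R_i+R_k)}} = \frac{1}{4}\sqrt{\frac{R_i}{R_i+R_k}}\; e^{-\frac{(f(\theta_i)-f(\theta_k))^2 t}{2\left(f(\theta_i)(1-f(\theta_i))+f(\theta_k)(1-f(\theta_k))\right)}}
\end{aligned}\tag{16}$$

with $R_i=f(\theta_i)(1-f(\theta_i))t$ and $R_k=f(\theta_k)(1-f(\theta_k))t$, where the two inequalities are from the Chernoff bound.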
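The Chernoff-type bound on the Q-function used in (15) and (16), $Q(x)\le\frac{1}{2}e^{-x^2/2}$ for $x\ge 0$, can be checked numerically. The sketch below is illustrative only: the function names, the grid of test points, and the sample values $f(\theta_i)=0.5$, $f(\theta_k)=0.6$, $t=200$ are assumptions and do not come from the paper.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def chernoff_bound(x):
    """Chernoff-type upper bound 0.5 * exp(-x^2 / 2), valid for x >= 0."""
    return 0.5 * math.exp(-x * x / 2.0)

def first_term_bound(f_i, f_k, t):
    """Return (Q value, Chernoff bound) for the first term of (14)."""
    r_k = f_k * (1.0 - f_k) * t            # variance of the normal approximation of X_k^t
    x = (f_k - f_i) * t / math.sqrt(r_k)   # normalized gap between the two means
    return q_function(x), chernoff_bound(x)

grid = [0.05 * n for n in range(201)]      # x in [0, 10]
bound_holds = all(q_function(x) <= chernoff_bound(x) + 1e-15 for x in grid)
q_value, bound_value = first_term_bound(f_i=0.5, f_k=0.6, t=200)
```

Both quantities decay exponentially in $t$ for a fixed gap $f(\theta_k)-f(\theta_i)>0$, which is the property the proof uses.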

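From (11) and (13)-(16), we can bound the false access probability, for Slot $t$ when $t>t_0$, as

$$\begin{aligned}
\mathrm{Prob}\big(k_{\hat\Theta(t)}(X(t))\ne k_{\Theta}(X(t))\big)
&\le \sum_{u\in\mathcal{U}}\ \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \mathrm{Prob}\big(\hat\theta_i(t)>\hat\theta_k(t)\big)\,\mathrm{Prob}\big(X(t)=u\big)\\
&\le \sum_{u\in\mathcal{U}}\ \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \left(\frac{1}{2}\, e^{-\frac{(f(\theta_k)-f(\theta_i))^2 t}{2f(\theta_k)(1-f(\theta_k))}} + \frac{1}{4}\sqrt{\frac{R_i}{R_i+R_k}}\; e^{-\frac{(f(\theta_i)-f(\theta_k))^2 t}{2\left(f(\theta_i)(1-f(\theta_i))+f(\theta_k)(1-f(\theta_k))\right)}}\right)\mathrm{Prob}\big(X(t)=u\big)\\
&\le c_1 e^{-c_2 t},
\end{aligned}$$

where $R_i=f(\theta_i)(1-f(\theta_i))t$, $c_1$ is a positive constant that does not depend on $t$, and

$$c_2=\min_{i>k}\left\{\frac{(f(\theta_k)-f(\theta_i))^2}{2f(\theta_k)(1-f(\theta_k))},\ \frac{(f(\theta_i)-f(\theta_k))^2}{2\left(f(\theta_i)(1-f(\theta_i))+f(\theta_k)(1-f(\theta_k))\right)}\right\}.$$

Then for regret $R(t)$ of Algorithm 2 we have

$$\begin{aligned}
\limsup_{t\to\infty} R(t) &\le \limsup_{t\to\infty}\sum_{j=1}^{t} c_0\,\mathrm{Prob}\big(k_{\hat\Theta(j)}(X(j))\ne k_{\Theta}(X(j))\big)\\
&\le \sum_{j=1}^{t_0} c_0\,\mathrm{Prob}\big(k_{\hat\Theta(j)}(X(j))\ne k_{\Theta}(X(j))\big) + \limsup_{t\to\infty}\sum_{j=t_0+1}^{t} c_0\,\mathrm{Prob}\big(k_{\hat\Theta(j)}(X(j))\ne k_{\Theta}(X(j))\big)\\
&\le c_0 t_0 + \sum_{j=t_0+1}^{\infty} c_0 c_1 e^{-c_2 j} < \infty
\end{aligned}$$

where $c_0$ denotes the largest possible reward loss by accessing a false channel (i.e., a suboptimal channel) at a slot.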
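The finiteness of the tail sum in the last step follows from a geometric series. As a short worked equation (using only the constants $c_0$, $c_1$, $c_2$ and $t_0$ from the bound above, with $c_2>0$):

```latex
\sum_{j=t_0+1}^{\infty} c_0 c_1 e^{-c_2 j}
= c_0 c_1 \, \frac{e^{-c_2 (t_0+1)}}{1-e^{-c_2}} < \infty
\qquad (c_2 > 0).
```

So the regret accumulated after Slot $t_0$ is bounded by a constant that does not grow with $t$.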

Then consider heterogeneous sensing when we do not have $P_d^i = P_d$ and $P_f^i = P_f$, $i\in\{1,2,\ldots,N\}$. Without loss of generality, assume $\frac{(1-P_f^1)\theta_1}{f(\theta_1)} > \cdots > \frac{(1-P_f^N)\theta_N}{f(\theta_N)}$. Then (12) is rewritten as



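$$\begin{aligned}
\mathrm{Prob}\big(k_{\hat\Theta(t)}(u)\ne k_{\Theta}(u)\big)
&\le \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \mathrm{Prob}\left(\frac{(1-P_f^i)\hat\theta_i(t)}{f(\hat\theta_i(t))} > \frac{(1-P_f^k)\hat\theta_k(t)}{f(\hat\theta_k(t))}\right)\\
&= \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \mathrm{Prob}\left(X_i^t X_k^t\left(\frac{1-P_f^i}{P_d^i-P_f^i}-\frac{1-P_f^k}{P_d^k-P_f^k}\right) > t\left(\frac{(1-P_f^i)(1-P_d^i)}{P_d^i-P_f^i}X_k^t-\frac{(1-P_f^k)(1-P_d^k)}{P_d^k-P_f^k}X_i^t\right)\right)
\end{aligned}\tag{17}$$

where the second line comes from $\hat\theta_i(t)=\frac{X_i^t/t-(1-P_d^i)}{P_d^i-P_f^i}$ (so that $f(\hat\theta_i(t))=X_i^t/t$). With

$$d_1=\frac{\dfrac{(1-P_f^i)(1-P_d^i)}{P_d^i-P_f^i}}{\dfrac{1-P_f^i}{P_d^i-P_f^i}-\dfrac{1-P_f^k}{P_d^k-P_f^k}},\qquad d_2=\frac{\dfrac{(1-P_f^k)(1-P_d^k)}{P_d^k-P_f^k}}{\dfrac{1-P_f^i}{P_d^i-P_f^i}-\dfrac{1-P_f^k}{P_d^k-P_f^k}},$$

(17) can be rewritten as

$$\begin{aligned}
\mathrm{Prob}\big(k_{\hat\Theta(t)}(u)\ne k_{\Theta}(u)\big) &\le \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \mathrm{Prob}\big(X_i^tX_k^t > td_1X_k^t - td_2X_i^t\big)\\
&= \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u}\left(\int_{td_1}^{+\infty} g_{X_i^t}(x)\int_{\frac{td_2x}{td_1-x}}^{+\infty} g_{X_k^t}(y)\,dy\,dx + \int_{-\infty}^{td_1} g_{X_i^t}(x)\int_{-\infty}^{\frac{td_2x}{td_1-x}} g_{X_k^t}(y)\,dy\,dx\right).
\end{aligned}\tag{18}$$

In order to get a bound of $\mathrm{Prob}\big(k_{\hat\Theta(t)}(u)\ne k_{\Theta}(u)\big)$, next we derive the bounds for the two terms in the summation in the last line in (18). Without loss of generality, we assume $\frac{1-P_f^i}{P_d^i-P_f^i}-\frac{1-P_f^k}{P_d^k-P_f^k}>0$, while the scenario with $\frac{1-P_f^i}{P_d^i-P_f^i}-\frac{1-P_f^k}{P_d^k-P_f^k}<0$ can be similarly proved. Note that when $\frac{1-P_f^i}{P_d^i-P_f^i}-\frac{1-P_f^k}{P_d^k-P_f^k}=0$, a way similar to that in the homogeneous sensing can be used to derive a bound of $\mathrm{Prob}\big(k_{\hat\Theta(t)}(u)\ne k_{\Theta}(u)\big)$.

For the first term in the summation in the last line in (18), we have

$$\int_{td_1}^{+\infty} g_{X_i^t}(x)\int_{\frac{td_2x}{td_1-x}}^{+\infty} g_{X_k^t}(y)\,dy\,dx \le \int_{td_1}^{+\infty} g_{X_i^t}(x)\,dx = Q\left(\frac{(d_1-f(\theta_i))\,t}{\sqrt{f(\theta_i)(1-f(\theta_i))\,t}}\right) \le \frac{1}{2}\, e^{-\frac{(d_1-f(\theta_i))^2 t}{2f(\theta_i)(1-f(\theta_i))}}$$

where the last inequality comes from the Chernoff bound, in which the following fact is used:

$$d_1=\frac{\dfrac{(1-P_f^i)(1-P_d^i)}{P_d^i-P_f^i}}{\dfrac{1-P_f^i}{P_d^i-P_f^i}-\dfrac{1-P_f^k}{P_d^k-P_f^k}} \ge \frac{(1-P_f^i)(1-P_d^i)}{(1-P_f^i)-(P_d^i-P_f^i)} = 1-P_f^i > f(\theta_i).$$

The second term in the summation in the last line in (18) is decomposed into two sub-terms: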



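$$\begin{aligned}
\int_{-\infty}^{td_1} g_{X_i^t}(x)\int_{-\infty}^{\frac{td_2x}{td_1-x}} g_{X_k^t}(y)\,dy\,dx
&= \int_{\frac{tf(\theta_k)d_1}{d_2+f(\theta_k)}}^{td_1} g_{X_i^t}(x)\int_{-\infty}^{\frac{td_2x}{td_1-x}} g_{X_k^t}(y)\,dy\,dx\\
&\quad+ \int_{-\infty}^{\frac{tf(\theta_k)d_1}{d_2+f(\theta_k)}} g_{X_i^t}(x)\int_{-\infty}^{\frac{td_2x}{td_1-x}} g_{X_k^t}(y)\,dy\,dx.
\end{aligned}\tag{19}$$

The first sub-term in (19) is bounded as

$$\begin{aligned}
\int_{\frac{tf(\theta_k)d_1}{d_2+f(\theta_k)}}^{td_1} g_{X_i^t}(x)\int_{-\infty}^{\frac{td_2x}{td_1-x}} g_{X_k^t}(y)\,dy\,dx
&\le \int_{\frac{tf(\theta_k)d_1}{d_2+f(\theta_k)}}^{td_1} g_{X_i^t}(x)\,dx \le \int_{\frac{tf(\theta_k)d_1}{d_2+f(\theta_k)}}^{+\infty} g_{X_i^t}(x)\,dx\\
&= Q\left(\frac{\left(\frac{f(\theta_k)d_1}{d_2+f(\theta_k)}-f(\theta_i)\right)t}{\sqrt{f(\theta_i)(1-f(\theta_i))\,t}}\right) \le \frac{1}{2}\, e^{-\frac{\left(\frac{f(\theta_k)d_1}{d_2+f(\theta_k)}-f(\theta_i)\right)^2 t}{2f(\theta_i)(1-f(\theta_i))}}
\end{aligned}\tag{20}$$

where the last inequality comes from the Chernoff bound. In the derivation of the last inequality in (20), we should have $f(\theta_i) < \frac{f(\theta_k)d_1}{d_2+f(\theta_k)}$ for $i>k$. This is satisfied from the following fact

$$\begin{aligned}
f(\theta_i)\big(d_2+f(\theta_k)\big)\left(\frac{1-P_f^i}{P_d^i-P_f^i}-\frac{1-P_f^k}{P_d^k-P_f^k}\right)
&= f(\theta_i)\left[f(\theta_k)\frac{1-P_f^i}{P_d^i-P_f^i}-\theta_k(1-P_f^k)\right]\\
&< f(\theta_k)\left[f(\theta_i)\frac{1-P_f^i}{P_d^i-P_f^i}-\theta_i(1-P_f^i)\right]\\
&= f(\theta_k)\,d_1\left(\frac{1-P_f^i}{P_d^i-P_f^i}-\frac{1-P_f^k}{P_d^k-P_f^k}\right)
\end{aligned}$$

where the first equality comes from the definition of $d_2$, the inequality comes from $\frac{(1-P_f^i)\theta_i}{f(\theta_i)} < \frac{(1-P_f^k)\theta_k}{f(\theta_k)}$ for $i>k$, and the last equality comes from the definition of $d_1$.

The second sub-term in (19) is bounded as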



tf(»k) d l td 2* 

/ " 9xt( x ) / 5x*(y)dj/dx 

J — DC J —OO 

Ri=m)(i-m))t tf(e )d t . 2 

Rk=f(6 h ){l-f(6 k ))t f d 2 + ho k ) 1 _( £= mW^ {Td^'f-o^) 

< / — ; ■ e 2R i e 2R * dx 

, , tf(e fc )di , ( d 2 + f(e k ) \ 2 

(») f d 2 + f { e k ) 1 P-/(»i)*) 3 I gj —^^"J 

< / — e 2n i e 2H fc da; 



A=(d 2 +f(9 k )) 2 /d 2 1 t/(fl fc )di 

i/=t/(e fc )<*i/(d2+/(ei0) / d 2+/(f fc ) 1 _«,(x- t / (8 ))-+ ai A ( .-ff) 

= / : P, 2R i R k 



-OO 



dx 



{Rk+n . l A ) ( Rkt if i e t f +Rt AHi)-(i lktJ( e l)+n . l AHf r^f^j j ggggffi A " ) 2 

J-oc 2^^ 

i / O («fc+-R,^)(«fct 2 /(»i) 2 + Hi^ff 2 )-(«fc*/(«,) + Ri^«) 2 f R k tf(6i)+RiAH tf{0 ) 

' (H f .+fi i A)2H i J?. fc Q I Rk + RiA 



2 V Rk + RzA \ I RiRk 



J / d (H fc +B i A)(fl fc t 2 /(8 i ) 2 +fl i AJf 2 )-(fl fc t/(fl i ) + H i Aff) 2 + ^t/(fl i )- Aff ^ 2 ( Hfc + Hi A) 2 

" 4 V R k + R t A 

1 / fii t 2 (f(ei)) 2 («fc+fij^) 1 / Wi (/C?,)) 2 (/(g fc )(i-/(a fc ))+/(«,)(!- J(f,))^) . 

< _,/ 2^ e -zRiRk = -\ e 2/<e i )(i-/(e i ))/(£> fc )(i-/(£) fc )) 

" 4 V i?fc + i?,A 4 V i? fc + i? l ^4 



(21)

where (a) comes from the fact that for $x\in\left(-\infty, \frac{tf(\theta_k)d_1}{d_2+f(\theta_k)}\right]$, we have $\frac{td_2x}{td_1-x} \le \frac{d_2+f(\theta_k)}{d_1}x \le tf(\theta_k)$, (b) comes from the fact that since $f(\theta_i)<d_1$ we have $tf(\theta_i) < H = tf(\theta_k)d_1/(d_2+f(\theta_k))$, and other inequalities come from the Chernoff bound.

From (11) and (18)-(21), we can bound the false access probability, for Slot $t$ when $t>t_0$, as

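$$\begin{aligned}
&\mathrm{Prob}\big(k_{\hat\Theta(t)}(X(t))\ne k_{\Theta}(X(t))\big)\\
&\quad\le \sum_{u\in\mathcal{U}}\ \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u} \mathrm{Prob}\left(\frac{(1-P_f^i)\hat\theta_i(t)}{f(\hat\theta_i(t))} > \frac{(1-P_f^k)\hat\theta_k(t)}{f(\hat\theta_k(t))}\right)\mathrm{Prob}\big(X(t)=u\big)\\
&\quad\le \sum_{u\in\mathcal{U}}\ \sum_{i>k,\ i\in\mathcal{X}_u,\ k\in\mathcal{X}_u}\Bigg(\frac{1}{2}\, e^{-\frac{(d_1-f(\theta_i))^2 t}{2f(\theta_i)(1-f(\theta_i))}} + \frac{1}{2}\, e^{-\frac{\left(\frac{f(\theta_k)d_1}{d_2+f(\theta_k)}-f(\theta_i)\right)^2 t}{2f(\theta_i)(1-f(\theta_i))}}\\
&\qquad\qquad + \frac{1}{4}\sqrt{\frac{R_i}{R_k+R_iA}}\; e^{-\frac{f(\theta_i)^2\left(f(\theta_k)(1-f(\theta_k))+f(\theta_i)(1-f(\theta_i))A\right) t}{2f(\theta_i)(1-f(\theta_i))f(\theta_k)(1-f(\theta_k))}}\Bigg)\mathrm{Prob}\big(X(t)=u\big)\\
&\quad\le c_3 e^{-c_4 t}
\end{aligned}$$

where $A=(d_2+f(\theta_k))^2/d_1^2$ as in (21), $c_3$ is a positive constant that does not depend on $t$, and

$$c_4=\min_{i>k}\left\{\frac{(d_1-f(\theta_i))^2}{2f(\theta_i)(1-f(\theta_i))},\ \frac{\left(\frac{f(\theta_k)d_1}{d_2+f(\theta_k)}-f(\theta_i)\right)^2}{2f(\theta_i)(1-f(\theta_i))},\ \frac{f(\theta_i)^2\left(f(\theta_k)(1-f(\theta_k))+f(\theta_i)(1-f(\theta_i))A\right)}{2f(\theta_i)(1-f(\theta_i))f(\theta_k)(1-f(\theta_k))}\right\}.$$

Therefore, for regret $R(t)$ of Algorithm 2 we have

$$\begin{aligned}
\limsup_{t\to\infty} R(t) &\le \limsup_{t\to\infty}\sum_{j=1}^{t} c_0\,\mathrm{Prob}\big(k_{\hat\Theta(j)}(X(j))\ne k_{\Theta}(X(j))\big)\\
&\le \sum_{j=1}^{t_0} c_0\,\mathrm{Prob}\big(k_{\hat\Theta(j)}(X(j))\ne k_{\Theta}(X(j))\big) + \limsup_{t\to\infty}\sum_{j=t_0+1}^{t} c_0\,\mathrm{Prob}\big(k_{\hat\Theta(j)}(X(j))\ne k_{\Theta}(X(j))\big)\\
&\le c_0 t_0 + \sum_{j=t_0+1}^{\infty} c_0 c_3 e^{-c_4 j} < \infty.
\end{aligned}\tag{22}$$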

Appendix III
Proof of Theorem 3

Recall that we assume $\theta_1>\theta_2>\cdots>\theta_N$, and for the genie-aided rule, $\mathcal{M}^*=\{1,2,\ldots,M\}$ is the optimal set of channels to sense. Then for any rule, the expected reward loss in a slot (say Slot $j$) is bounded by the maximal expected reward of the genie-aided rule in the slot, given as $\Delta = B(T-\tau)\,E\left[\max_{i\in\mathcal{M}^*}\frac{(1-P_f)\theta_i}{f(\theta_i)}X_i(j)\right]$, where $f(\theta_i)=(1-P_f)\theta_i+(1-P_d)(1-\theta_i)$ is the probability that Channel $i$ is sensed free. Throughout our proofs, $I_{\{A\}}$ is an indicator function for an event $A$.
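The role of $f(\theta_i)$ can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the function names, the parameter values ($\theta=0.6$, $P_d=0.9$, $P_f=0.1$, 20000 slots), and the seed are illustrative and not taken from the paper. The estimator simply inverts $f(\theta)=(1-P_d)+(P_d-P_f)\theta$, which is the homogeneous-sensing estimate used in the proof of Theorem 2.

```python
import random

def sensed_free_prob(theta, p_d, p_f):
    """f(theta): probability that a channel with free probability theta is sensed free."""
    return (1.0 - p_f) * theta + (1.0 - p_d) * (1.0 - theta)

def estimate_theta(free_count, num_slots, p_d, p_f):
    """Invert f: map the empirical sensed-free frequency back to an estimate of theta."""
    return (free_count / num_slots - (1.0 - p_d)) / (p_d - p_f)

def simulate_free_count(theta, p_d, p_f, num_slots, rng):
    """Count the slots in which the imperfect sensor reports the channel as free."""
    count = 0
    for _ in range(num_slots):
        channel_free = rng.random() < theta
        if channel_free:
            sensed_free = rng.random() >= p_f   # free channel, no false alarm
        else:
            sensed_free = rng.random() >= p_d   # busy channel, missed detection
        count += sensed_free
    return count

rng = random.Random(1)                          # fixed seed, for reproducibility only
theta, p_d, p_f, num_slots = 0.6, 0.9, 0.1, 20000
free_count = simulate_free_count(theta, p_d, p_f, num_slots, rng)
theta_hat = estimate_theta(free_count, num_slots, p_d, p_f)
```

With these assumed values, $f(\theta)=0.58$, and the estimate `theta_hat` should land close to 0.6 because the empirical sensed-free frequency concentrates around $f(\theta)$.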

Recall that in Algorithm 3, $\mathcal{M}(j)$ denotes the set of channels to sense at Slot $j$. So until Slot $t$, the regret $R(t)$ of Algorithm 3 is bounded as

t 



i?(i)<A$>[/ {M(j) ^. } ] 



(23)

where $\mathcal{X}_{\mathcal{M}^*}(j)$ denotes the sensed-free channels in Slot $j$ when the channels in $\mathcal{M}^*$ are sensed. On the right hand side of (23), the first term is the regret bound when the secondary user does not select exactly $\mathcal{M}^*$ to sense (i.e., $\mathcal{M}(j)\ne\mathcal{M}^*$), and the second term is the regret bound when the secondary user senses the channels in $\mathcal{M}^*$ but does not select the best sensed-free channel to access.

In the sequel of this proof, for Slot $j$, denote $\hat\theta_k(T_k(j-1))$ as the estimated free probability of Channel $k$, as described in Algorithm 3, when Channel $k$ has been sensed in $T_k(j-1)$ slots until Slot $j-1$.
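The events analyzed later in this proof compare quantities of the form $\hat\theta_k(t_1)+\frac{1}{P_d-P_f}\sqrt{2\ln j/t_1}$, i.e., an estimate plus an exploration bonus. The sketch below shows how such an index could drive the choice of $M$ channels to sense. It is an assumption-laden illustration, not a transcription of Algorithm 3 (which is not reproduced in this appendix): the function names and the tie-breaking by sort order are hypothetical.

```python
import math

def sensing_index(theta_hat, times_sensed, slot, p_d, p_f):
    """Estimate plus exploration bonus, in the form compared in the proof (assumed index)."""
    bonus = math.sqrt(2.0 * math.log(slot) / times_sensed) / (p_d - p_f)
    return theta_hat + bonus

def select_channels(theta_hats, counts, slot, num_to_sense, p_d, p_f):
    """Pick the num_to_sense channels with the largest index (hypothetical selection rule)."""
    indices = [sensing_index(th, n, slot, p_d, p_f)
               for th, n in zip(theta_hats, counts)]
    ranked = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(ranked[:num_to_sense])

# Equal sensing counts: the bonus is identical, so the best estimates win.
well_explored = select_channels([0.8, 0.5, 0.3, 0.6], [100, 100, 100, 100],
                                slot=400, num_to_sense=2, p_d=0.9, p_f=0.1)
# Channel 2 has been sensed only once: its large bonus forces exploration.
under_explored = select_channels([0.8, 0.5, 0.3, 0.6], [100, 100, 1, 100],
                                 slot=400, num_to_sense=2, p_d=0.9, p_f=0.1)
```

The second call illustrates why a suboptimal channel is still sensed occasionally: the bonus shrinks only as the channel's sensing count grows, which is what the counting argument for $T_i(t)$ quantifies.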

Now we derive a bound for the first term on the right hand side of (23). Recall that $T_i(t)$ is the number of slots in which Channel $i$ is sensed until Slot $t$. Then we have

$$\sum_{j=1}^{t} E\big[I_{\{\mathcal{M}(j)\ne\mathcal{M}^*\}}\big] \le \sum_{i=M+1}^{N} E\big[T_i(t)\big]. \tag{24}$$

Further, for $M+1\le i\le N$ and any positive integer $l$, we have

t t t 

Ti(t) = 1 + Yl hi&Mij)} = 1 + Yj I{ieM{j), T z (j-l)>l} + Yl I{i£M(j), T z {j-l)<l} 

t 

<l+ £ I {i£M(3),T i (j-X)>l} 

Mil* 1 
t 

M t-1 

< 1+ V Y I( r , r , 1 

fe _! . r«i ^ min \eZ(t 1 )+-±—.f?^}<m ax \ej (t- 2 )+— l — y 1 ^ 1 } \ 

A/ i j j 

(25) 

2]nj 

fa 
2 In j 

Pd-PjM £1 ' 



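Similar to the analysis in [22], we have the fact that if the event $\hat\theta_k(t_1)+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_1}} \le \hat\theta_i(t_2)+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}$ happens, then at least one of the following three events will happen: $\hat\theta_k(t_1)\le\theta_k-\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_1}}$, $\hat\theta_i(t_2)\ge\theta_i+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}$, and $\theta_k<\theta_i+\frac{2}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}$. In other words, we have

$$\begin{aligned}
E\left[I_{\left\{\hat\theta_k(t_1)+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_1}} \le \hat\theta_i(t_2)+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}\right\}}\right]
&\le E\left[I_{\left\{\hat\theta_k(t_1)\le\theta_k-\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_1}}\right\}}\right] + E\left[I_{\left\{\hat\theta_i(t_2)\ge\theta_i+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}\right\}}\right]\\
&\quad+ E\left[I_{\left\{\theta_k<\theta_i+\frac{2}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}\right\}}\right].
\end{aligned}\tag{26}$$

Using the Chernoff-Hoeffding bound, the first two terms on the right hand side of (26) are bounded as

$$E\left[I_{\left\{\hat\theta_k(t_1)\le\theta_k-\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_1}}\right\}}\right]\le e^{-4\ln j}=j^{-4},\qquad E\left[I_{\left\{\hat\theta_i(t_2)\ge\theta_i+\frac{1}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}\right\}}\right]\le e^{-4\ln j}=j^{-4}. \tag{27}$$

We note that if $t_2\ge\frac{8\ln t}{(\theta_M-\theta_i)^2(P_d-P_f)^2}$, then we always have $\theta_k\ge\theta_i+\frac{2}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}$ for any $k\in\mathcal{M}^*$ and $j\le t$, which means $I_{\left\{\theta_k<\theta_i+\frac{2}{P_d-P_f}\sqrt{\frac{2\ln j}{t_2}}\right\}}=0$. Therefore, by setting $l=\left\lceil\frac{8\ln t}{(\theta_M-\theta_i)^2(P_d-P_f)^2}\right\rceil$, from (24)-(27) we have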



N M oo j 

+ E E E E 

i=M+l k=lj = lt!=l 



t N 

E E ihMu^M*}] < E 

j=l i=M+l 

N 

< E 

i=M+l 



To bound the second term on the right hand side of (23), we have



3 

E 



2f 



8 In t 



(t>M-°i) 2 <. p d- p f) 2 



^_ F + (iV -M)(^ + l) 



(28) 



E / {M(j)=Al«}- r 
3=1 



< 1+ E I{M(j)=M*\I ( r , , n 

t 

<1+ X) E IlM(-i\=M*\l 



i<k, iykaM* 



< E Mi,fc+ E I J {A<(j)=A<*,n(j-l)>J<,fc} 



{eim 3 -i) )+ ^^^<el { T k{3 -i))+^^^} 



t j j 

— E i h,k + E E E ^/flT(- 4 1+ i /2toj < gTf t; - ) | i ./mri 



(29) 



where $l_{i,k}$ can be an arbitrary positive integer.
Similar to the treatments in (26)-(28), the second term on the right hand side of (23) is bounded as



I {M{j)=M*} 1 



<Alnt E M ).; fl - P/ ). +Ag)(V + l). 

i<k&M* 

Then, from (23), (28) and (30), the regret until Slot $t$ is bounded as



(30) 



(^ + l) + A(f) (£ + 1 



In other words, $R(t)\sim O(\ln t)$ for finite $t$ and for $t\to\infty$.

Appendix IV
Proof of Theorem 4

Denote $\mathcal{M}^*$ as the optimal set of channels to sense (i.e., the set of channels to sense in the genie-aided rule). Denote $\mathcal{M}(t)$ as the channel set decided by Algorithm 4 to be sensed at Slot $t$. Similar to the proof of Theorem 3, the regret $R(t)$ until Slot $t$ is bounded as



R(t)<A^E[l 

3=1 



3=1 



■ I 



E[S„ 



U 

,=l]>E[S m „ \X„ 



2 1n(j-l) 



/ 21n(j-l)~ ] 



.=1] 



(32) 



Next we derive bounds for the two terms on the right hand side of (32), respectively.

Since $T_i(t)$ is the number of slots that channel set $\mathcal{M}_i$ is sensed until Slot $t$, the first term on the right hand side of (32) is $\Delta\sum_{j=1}^{t} E\big[I_{\{\mathcal{M}(j)\ne\mathcal{M}^*\}}\big] = \Delta\sum_{\mathcal{M}_i\ne\mathcal{M}^*,\ i\in\{1,2,\ldots,\binom{N}{M}\}} E\big[T_i(t)\big]$.

For each $i\in\{1,2,\ldots,\binom{N}{M}\}$, it can be proved that the reward sequence $Y_i(t)|_{T_i(t)=1}, Y_i(t)|_{T_i(t)=2}, \ldots, Y_i(t)|_{T_i(t)=n}$ satisfies a so-called drift condition. The proof is omitted due to space limit.

Similar to the treatments in (|23T) -(|28T). we have S[Tj(i)] < ^ + ^ + 1 where 



max E [S[\Xi = 1] 



max = 1] 



and $\mathcal{X}_{\mathcal{M}_i}$ is the set of sensed-free channels if $\mathcal{M}_i$ is sensed. Therefore, the first term on the right hand side of (32) is bounded as



3=1 



. } ] < Alnt 



E f 

**u O) 5 * 



+ A 



A/ 
M 



1 



7T" 



+ 1 



(33) 



Similar to the treatments in (29)-(30), we have a bound for the second term on the right hand side of

+ Aft)(£ + 1). 



33) as A hit V 



It can be seen that the two terms on the right hand side of (32) are bounded by $O(\ln t)$. Therefore, the regret until Slot $t$, $R(t)$, is $O(\ln t)$.

Fig. 1. Average regret R(t) of Algorithm 1 with homogeneous sensing in Case I (full channel sensing)

Fig. 2. Average regret R(t) of Algorithm 1 with heterogeneous sensing in Case I (full channel sensing)

Fig. 5. Average R(t)/ln t of Algorithm 3 (single channel access) with homogeneous sensing in Case II (partial channel sensing)

Fig. 6. Average R(t)/ln t of proposed multiple channel access rule with homogeneous sensing in Case II (partial channel sensing)

Fig. 7. Average R(t)/ln t of Algorithm 4 (single channel access) with heterogeneous sensing in Case II (partial channel sensing)

Fig. 8. Average R(t)/ln t of proposed multiple channel access rule with heterogeneous sensing in Case II (partial channel sensing)